Multi-Armed Bandits for Adaptive Constraint Propagation
Authors
Abstract
Adaptive constraint propagation has recently received considerable attention. It allows a constraint solver to exploit various levels of propagation during search, and in many cases it performs better than a static, predefined propagation level. The crucial point is to make adaptive constraint propagation automatic, so that no expert knowledge or parameter specification is required. In this work, we propose a simple learning technique, based on multi-armed bandits, that allows the solver to automatically select among several levels of propagation during search. Our technique enables the combination of any number of levels of propagation, whereas existing techniques are only defined for pairs. An experimental evaluation demonstrates that the proposed technique results in a more efficient and stable solver.
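A minimal sketch of the idea, assuming a standard UCB1 bandit in which each propagation level is an arm and the reward (here a random placeholder) would in practice be derived from the solver, e.g. pruning achieved per unit of propagation time. The class name, level names, and reward signal are illustrative assumptions, not the paper's exact formulation:

import math
import random

class PropagationLevelBandit:
    # One arm per propagation level, scored with the UCB1 index.
    def __init__(self, levels):
        self.levels = levels              # e.g. ["FC", "AC", "maxRPC"] (hypothetical names)
        self.counts = [0] * len(levels)   # times each level has been selected
        self.values = [0.0] * len(levels) # running mean reward per level

    def select(self):
        # Play every level once, then pick the level with the largest UCB1 index.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        total = sum(self.counts)
        return max(range(len(self.levels)),
                   key=lambda i: self.values[i]
                   + math.sqrt(2.0 * math.log(total) / self.counts[i]))

    def update(self, i, reward):
        # Incremental update of the mean reward of the chosen level.
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]

# Hypothetical usage inside a search loop: one decision per search node.
bandit = PropagationLevelBandit(["FC", "AC", "maxRPC"])
for node in range(1000):
    level = bandit.select()
    reward = random.random()   # placeholder for a solver-derived reward signal
    bandit.update(level, reward)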
Similar works
Monotone multi-armed bandit allocations
We present a novel angle for multi-armed bandits (henceforth abbreviated MAB) which follows from the recent work on MAB mechanisms (Babaioff et al., 2009; Devanur and Kakade, 2009; Babaioff et al., 2010). The new problem is, essentially, about designing MAB algorithms under an additional constraint motivated by their application to MAB mechanisms. This note is self-contained, although some fami...
Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we d...
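To make the heuristic concrete, here is a minimal Thompson Sampling sketch for the simpler Bernoulli-reward case (the paper itself studies linear contextual payoffs). The pull function and the arm probabilities below are assumptions used only for illustration:

import random

def thompson_sampling_bernoulli(pull, n_arms, horizon):
    # Beta(1, 1) priors; s and f count successes and failures per arm.
    s = [1] * n_arms
    f = [1] * n_arms
    for _ in range(horizon):
        # Sample a plausible mean for each arm and play the best sample.
        samples = [random.betavariate(s[a], f[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if pull(arm):          # 0/1 reward from the environment
            s[arm] += 1
        else:
            f[arm] += 1
    return s, f

# Hypothetical environment with three arms of different success probabilities.
probs = [0.2, 0.5, 0.7]
s, f = thompson_sampling_bernoulli(lambda a: random.random() < probs[a], 3, 5000)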
Modal Bandits
Analyses of multi-armed bandits primarily presume that the value of an arm is its expected reward. We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions.
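As a small illustration of what taking the mode (rather than the mean) as an arm's value means, the sketch below scores each arm by the empirical mode of its observed discrete rewards; the reward histories are invented for the example and this is not the paper's algorithm:

from collections import Counter

def empirical_mode(rewards):
    # Most frequent observed reward value (ties broken arbitrarily).
    return Counter(rewards).most_common(1)[0][0]

history = {0: [1, 1, 1, 10, 10],   # mean 4.6 but mode 1
           1: [3, 3, 3, 3, 3]}     # mean 3.0 and mode 3
best = max(history, key=lambda a: empirical_mode(history[a]))
# A mean-based rule would prefer arm 0; a mode-based rule prefers arm 1.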
On Adaptive Estimation for Dynamic Bernoulli Bandits
The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm’s reward distrib...
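One simple way to keep an arm's estimate adaptive when its reward distribution drifts is an exponentially discounted success-rate estimate that down-weights old observations. This is only an illustrative sketch under that assumption, not necessarily the estimator developed in the paper:

def discounted_estimate(observations, gamma=0.95):
    # observations: 0/1 rewards in chronological order; gamma < 1 forgets the past.
    num = 0.0   # discounted count of successes
    den = 0.0   # discounted count of pulls
    for x in observations:
        num = gamma * num + x
        den = gamma * den + 1.0
    return num / den if den > 0 else 0.5

# An arm that used to pay off but has recently stopped paying:
print(discounted_estimate([1] * 50 + [0] * 50))   # about 0.07, recent data dominates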
Lecture 14: Bandits with Budget Constraints
1 Problem definition. In the regular multi-armed bandit problem, we pull an arm I_t at each round t and collect a reward r_t, which depends on the chosen arm I_t. Now suppose that in each round, after the arm is pulled, a cost c_t is also incurred. When the total cost up to time t surpasses a given budget B, the algorithm stops. This problem setting is called bandits with budget constraints. The formal descriptio...
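A minimal sketch of this setting, assuming a hypothetical pull(arm) -> (reward, cost) environment and a plain UCB1 index; the loop stops as soon as the cumulative cost exceeds the budget B:

import math
import random

def ucb_with_budget(pull, n_arms, budget):
    counts = [0] * n_arms
    means = [0.0] * n_arms
    spent, total_reward, t = 0.0, 0.0, 0
    while spent < budget:
        t += 1
        if t <= n_arms:
            arm = t - 1                   # try each arm once first
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward, cost = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward
        spent += cost                     # stop once the budget B is exhausted
    return total_reward

# Hypothetical use: three arms with unit costs and increasing expected rewards.
total = ucb_with_budget(lambda a: (random.random() * (a + 1) / 3.0, 1.0), 3, 100.0)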